bump zarr conventions; migrate type checking to pyright by d-v-b · Pull Request #199 · EOPF-Explorer/data-model

d-v-b · 2026-06-19T13:05:56Z

Summary

Updates the data model to the latest GeoZarr-relevant Zarr conventions (via the zarr-cm package) and overhauls the project's type-checking setup.

The work landed in two parts:

1. zarr-cm upgrade

Build all convention metadata (multiscales / spatial / proj) through zarr_cm.create_many via a single utils.build_convention_attrs helper. zarr-cm then validates each convention and emits its convention metadata object (CMO), instead of the code hand-assembling zarr_conventions and the spatial:/proj: keys.
Track the latest zarr-cm (currently a git dependency on its main branch), which ships a Mapping-covariant aggregate API, public JsonType/JsonValue/JsonDict exports, and a PEP 695 type JsonValue alias. The PEP 695 alias lets pydantic resolve MultiscaleGroupAttrs' ConventionMetadataObject field directly, so the model_rebuild() workaround is no longer needed.
Regenerated the golden-file snapshots whose embedded convention schema_url/spec_url changed with the new convention releases (URL-only diffs).
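The shape of the helper described above (validate each convention, merge its keys, collect its CMO under zarr_conventions) can be pictured with a small self-contained sketch. It does not call zarr-cm: the Convention dataclass, the prefix validators, the placeholder CMO contents, and the name build_convention_attrs_sketch are all hypothetical stand-ins for what zarr_cm.create_many and utils.build_convention_attrs actually do.

```python
from collections.abc import Callable, Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Convention:
    """Hypothetical stand-in for one zarr-cm convention."""

    cmo: Mapping[str, str]  # convention metadata object (placeholder fields)
    validate: Callable[[Mapping[str, object]], None]  # raises on invalid input


def _require_prefix(prefix: str) -> Callable[[Mapping[str, object]], None]:
    # Hypothetical validator: every key must carry the convention's namespace.
    def check(values: Mapping[str, object]) -> None:
        bad = [key for key in values if not key.startswith(prefix)]
        if bad:
            raise ValueError(f"keys without {prefix!r} prefix: {bad}")

    return check


CONVENTIONS: dict[str, Convention] = {
    "spatial": Convention({"name": "spatial"}, _require_prefix("spatial:")),
    "proj": Convention({"name": "proj"}, _require_prefix("proj:")),
}


def build_convention_attrs_sketch(
    data: Mapping[str, Mapping[str, object]],
) -> dict[str, object]:
    """Validate each convention's attributes, merge them, and list their CMOs."""
    attrs: dict[str, object] = {}
    cmos: list[dict[str, str]] = []
    for name, values in data.items():
        convention = CONVENTIONS[name]  # KeyError for an unknown convention
        convention.validate(values)  # reject bad input before emitting anything
        attrs.update(values)
        cmos.append(dict(convention.cmo))
    attrs["zarr_conventions"] = cmos
    return attrs
```

For example, build_convention_attrs_sketch({"spatial": {"spatial:dimensions": ["y", "x"]}, "proj": {"proj:code": "EPSG:32633"}}) returns both namespaced keys plus a zarr_conventions list with one placeholder CMO per convention (the key names here are illustrative). Centralizing this in one function is what lets a single library own validation and CMO emission instead of each call site.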

2. mypy → pyright migration

zarr-cm's convention TypedDicts now use PEP 728 extra_items, which mypy does not support. The type checker is switched to pyright (which does):

pyproject.toml: [tool.mypy] → [tool.pyright]; mypy → pyright dev dependency.
pre-commit + CI lint job run uv run --frozen pyright.

Reaching a clean pyright run (0 errors across src/ + tests/) involved:

removing stale mypy # type: ignore[...] comments;
narrowing NotRequired TypedDict access with .get() + guards instead of making keys Required (preserving runtime validation of genuinely-optional Sentinel-1/2 members);
replacing casts on external/untyped data with real runtime checks (CRS.from_user_input, a _as_bbox validator, defensive get_zarr_group resolution);
constructing models via Model.model_validate(...) rather than **dict.

No typing.Any is used in the changed code. Runtime behavior is preserved.

Verification

pyright: 0 errors (src + tests).
ruff check + ruff format: clean.
Test suite green (incl. Sentinel-1/2 round-trip and golden-file snapshot tests).
roborev review (job 114): Pass — no issues found.

Upstream

Filed (and merged) several zarr-cm improvements surfaced by this work: aggregate-API Mapping/type exports and the PEP 695 JsonValue alias.

🤖 Generated with Claude Code

        # Check if this variable already exists and is valid
        if not force_overwrite and store_exists:
-            if utils.validate_existing_band_data(existing_dataset, var, ds):
+            assert existing_dataset is not None  # guaranteed by store_exists


    for var in ds.data_vars:
        if hasattr(ds[var].data, "chunks"):
            current_chunks = ds[var].chunks
+            assert current_chunks is not None  # guaranteed by hasattr(..., "chunks")


codecov-commenter · 2026-06-19T13:15:09Z


Codecov Report

❌ Patch coverage is 33.07927% with 439 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/eopf_geozarr/data_api/s1.py	8.35%	384 Missing ⚠️
src/eopf_geozarr/conversion/geozarr.py	37.93%	18 Missing ⚠️
src/eopf_geozarr/s2_optimization/s2_multiscale.py	78.26%	10 Missing ⚠️
src/eopf_geozarr/s2_optimization/s2_converter.py	81.81%	6 Missing ⚠️
src/eopf_geozarr/data_api/geozarr/v2.py	50.00%	4 Missing ⚠️
src/eopf_geozarr/data_api/geozarr/v3.py	33.33%	4 Missing ⚠️
src/eopf_geozarr/pyz/common.py	0.00%	4 Missing ⚠️
src/eopf_geozarr/conversion/fs_utils.py	88.23%	2 Missing ⚠️
.../eopf_geozarr/conversion/sentinel1_reprojection.py	75.00%	2 Missing ⚠️
src/eopf_geozarr/data_api/geozarr/store.py	0.00%	2 Missing ⚠️
... and 3 more


Route multiscales, spatial, and proj convention metadata through a single utils.build_convention_attrs() helper that delegates to zarr_cm.create_many, so zarr-cm validates each convention and emits its CMO. Previously the CMOs were hand-placed and the spatial:/proj:/multiscales keys assembled by hand, which skipped zarr-cm validation (e.g. multiscales' layout>=1 and derived_from=>transform rules) and duplicated key strings.

Type the helper precisely with zarr-cm's TypedDicts (SpatialAttrs, GeoProjAttrs, MultiscalesAttrs, MultiConventionAttrs) and a CRSLike Protocol instead of dict[str, Any]. Multiscales data is still produced by the project's MultiscaleMeta model (it also covers the TMS encoding zarr-cm doesn't model), but its CMO and validation now go through zarr-cm. Output is byte-identical to before (verified against the golden-file snapshot tests).

Add tests/test_conversion/test_convention_attrs.py covering the helper, including that zarr-cm now rejects an invalid multiscales layout.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
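The commit message names two multiscales rules that hand-assembled metadata used to skip. As a rough, stand-alone illustration only, here is a check under one reading of that shorthand: "layout>=1" as "the layout list has at least one entry" and "derived_from=>transform" as "an entry that declares derived_from must also declare a transform". The function name and the asset key in the usage are hypothetical, and zarr-cm's actual validation may differ in detail.

```python
from collections.abc import Mapping, Sequence


def check_multiscales_layout(multiscales: Mapping[str, object]) -> None:
    """Raise ValueError if the (assumed) layout rules are violated."""
    layout = multiscales.get("layout")
    # Rule 1 (assumed reading of "layout>=1"): a non-empty list of levels.
    if isinstance(layout, str) or not isinstance(layout, Sequence) or len(layout) < 1:
        raise ValueError("multiscales layout must contain at least one level")
    for index, level in enumerate(layout):
        if not isinstance(level, Mapping):
            raise ValueError(f"layout[{index}] must be an object")
        # Rule 2 (assumed reading of "derived_from=>transform").
        if "derived_from" in level and "transform" not in level:
            raise ValueError(f"layout[{index}] has derived_from but no transform")
```

A test in the spirit of the new test_convention_attrs.py would pass an empty layout and expect the error; in the real code that rejection now comes from zarr-cm rather than from project code.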

Ignore root-level scratch files (test.py, tmp.json, cli_test.sh) and the machine-local .claude/ session directory so they stop showing in git status. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

This reverts commit 4a5b6e4.

zarr-cm's convention TypedDicts now use PEP 728 `extra_items`, which mypy does not support. Switch the type checker to pyright (which does) and upgrade to the zarr-cm main git dep that also ships the supporting fixes:

- Mapping-covariant aggregate API + exported JsonType/JsonValue/JsonDict, so build_convention_attrs no longer needs a private `_core` import or a cast.
- PEP 695 `type JsonValue` alias, so pydantic resolves MultiscaleGroupAttrs' ConventionMetadataObject field directly (removed the model_rebuild workaround).

Toolchain:

- pyproject: replace [tool.mypy] with [tool.pyright]; mypy -> pyright dev dep.
- .pre-commit-config / CI lint job: run `uv run --frozen pyright`.

Type fixes across src/ and tests/ to reach a clean pyright run (0 errors):

- remove stale mypy `# type: ignore[...]` comments (pyright-unnecessary);
- narrow NotRequired TypedDict access (`.get()` + guard) instead of making keys Required, preserving runtime validation of optional Sentinel-1/2 members;
- replace casts on external/untyped data with runtime isinstance/validation;
- model construction via `Model.model_validate(...)` instead of `**dict`.

No `typing.Any` introduced. Runtime behavior unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

    )

+    # calculate_default_transform sizes the grid, so width and height are populated
+    assert width is not None



+    # calculate_default_transform sizes the grid, so width and height are populated
+    assert width is not None
+    assert height is not None


-CONVENTION_NAME = multiscales_cm.CMO["name"]
-CONVENTION_DESCRIPTION = multiscales_cm.CMO["description"]
+_CONVENTION_NAME = multiscales_cm.CMO.get("name")
+assert _CONVENTION_NAME is not None


+assert _CONVENTION_NAME is not None
+CONVENTION_NAME = _CONVENTION_NAME
+_CONVENTION_DESCRIPTION = multiscales_cm.CMO.get("description")
+assert _CONVENTION_DESCRIPTION is not None


-        expected_uuid = multiscales_cm.CMO["uuid"]
-        if not any(c["uuid"] == expected_uuid for c in value):
+        expected_uuid = multiscales_cm.CMO.get("uuid")
+        assert expected_uuid is not None


            scale_level_data["transform"] = multiscale_transform

        # Add spatial properties
+        assert "spatial_shape" in overview_level  # always populated by the producer above


Three casts asserted a type derived from Any or a union without verifying it. Replace each with an isinstance guard that raises TypeError on violation, so the assumption is enforced at runtime instead of only asserted to the checker:

- sentinel1_reprojection: rio.write_crs() returns Any -> verify xr.Dataset.
- s2_multiscale: output_group[base_path] is Array | Group -> verify zarr.Group.
- s2_multiscale: client.compute() returns Any -> verify distributed.Future.

The remaining casts bridge types on data we already validated (model_dump / create_many output), satisfy protocol/TypeVar binding on `self`, or widen a TypedDict for a third-party API; a runtime check there adds no safety.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
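The guard pattern in this commit can be factored into a tiny helper. The sketch below is hypothetical (the commit inlines each isinstance check, and expect_type is not a project function); json.loads stands in for an Any-returning call such as rio.write_crs(), because it is also typed as returning Any.

```python
import json
from typing import TypeVar

T = TypeVar("T")


def expect_type(value: object, expected: type[T], what: str) -> T:
    """Return value narrowed to expected, or raise TypeError.

    Unlike typing.cast, which only informs the type checker, this check
    runs at runtime and fails loudly if the assumption is wrong.
    """
    if not isinstance(value, expected):
        raise TypeError(f"{what}: expected {expected.__name__}, got {type(value).__name__}")
    return value


# json.loads is typed as returning Any; the helper turns that into a checked dict.
config = expect_type(json.loads('{"a": 1}'), dict, "parsed config")
```

Because isinstance against type[T] narrows value to T, pyright accepts the return without a cast, and a mismatch surfaces as a TypeError at the call site rather than as a confusing failure further downstream.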

d-v-b · 2026-06-26T15:48:20Z

@lhoupert i forgot to tag you for review when I opened this 🤦 lmk if you want to review it, otherwise i am inclined to merge for speed

fix the types

a92b437

github-advanced-security AI found potential problems Jun 19, 2026


d-v-b and others added 6 commits June 19, 2026 15:52

chore: gitignore local scratch files and .claude/

4a5b6e4


Revert "chore: gitignore local scratch files and .claude/"

6e2e145


fix type checking errors

f1c0277

wire up mypy correctly in pre-commit

931bf78

github-advanced-security AI found potential problems Jun 21, 2026


d-v-b changed the title from "bump zarr conventions" to "bump zarr conventions; migrate type checking to pyright" Jun 21, 2026

d-v-b marked this pull request as draft June 21, 2026 19:11

d-v-b and others added 2 commits June 21, 2026 21:14

chore: use latest version of zarr-cm

e75b3b5

d-v-b marked this pull request as ready for review June 21, 2026 19:29

d-v-b mentioned this pull request Jun 23, 2026

feat: Sentinel-3 OLCI L1 EFR → GeoZarr exporter d-v-b/data-model#2

Draft

d-v-b requested a review from lhoupert June 26, 2026 15:47

d-v-b wants to merge 9 commits into EOPF-Explorer:main from d-v-b:chore/new-conventions-metadata

3 participants
